Combining Document Representations
Author
Abstract
This paper presents a formal framework, based on evidential reasoning, for combining document representations. Each indexing method is modelled by an agent referred to as an indexer. Indexing elements are modelled as sentences that describe the content of a document. Modelling both the indexing and its uncertainty yields the document representation. Combining document representations is then expressed as combining the indexing, and its uncertainty, provided by two or more indexers; the resulting indexer is referred to as the combined indexer. The proposed framework captures the semantics of the indexing vocabularies associated with the indexers and aggregates the uncertainty associated with the indexing.
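Evidential reasoning is commonly formalized with the Dempster-Shafer theory of evidence. Below is a minimal sketch that assumes each indexer expresses its uncertainty as a mass function over sets of index terms and that two indexers are combined with Dempster's rule of combination. It is not necessarily the paper's own formulation, and the index terms, mass values, and function names (`combine_dempster`, `belief`) are hypothetical.

```python
from itertools import product

# Assumption: a mass function is a dict mapping a frozenset of index terms
# to a non-negative weight, with all weights summing to 1.


def combine_dempster(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Both indexers agree on (at least) the terms in the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets: this product mass is conflicting evidence.
            conflict += wa * wb
    if not combined:
        raise ValueError("total conflict: the indexers cannot be combined")
    # Renormalize so that the non-conflicting mass sums to 1.
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}


def belief(m, terms):
    """Belief in `terms`: total mass of focal sets contained in `terms`."""
    return sum(w for s, w in m.items() if s <= terms)


# Hypothetical vocabulary (frame of discernment) and two hypothetical indexers.
THETA = frozenset({"retrieval", "indexing", "logic"})
indexer_a = {frozenset({"retrieval"}): 0.6, THETA: 0.4}
indexer_b = {frozenset({"indexing"}): 0.5, THETA: 0.5}

combined_indexer = combine_dempster(indexer_a, indexer_b)
for focal, mass in combined_indexer.items():
    print(sorted(focal), round(mass, 4))
print(round(belief(combined_indexer, frozenset({"retrieval", "indexing"})), 4))
```

Mass left on the whole vocabulary (`THETA`) represents an indexer's ignorance, which is one way to keep the uncertainty of each indexing explicit through the combination; renormalizing by the conflicting mass is the standard Dempster step, though other combination rules handle conflict differently.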
Similar resources
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
Combining Text Vector Representations for Information Retrieval
This paper suggests a novel representation for documents that is intended to improve precision. This representation is generated by combining two central techniques: Random Indexing; and Holographic Reduced Representations (HRRs). Random indexing uses co-occurrence information among words to generate semantic context vectors that are the sum of randomly generated term identity vectors. HRRs are...
Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization
This paper investigates the use of concept-based representations for text categorization. We introduce a new approach to create concept-based text representations, and apply it to a standard text categorization collection. The representations are used as input to a Support Vector Machine classifier, and the results show that there are certain categories for which concept-based representations co...
Towards robust methods for spoken document retrieval
In this paper, we investigate a number of robust indexing and retrieval methods in an effort to improve spoken document retrieval performance in the presence of speech recognition errors. In particular, we examine expanding the original query representation to include confusable terms; developing a new document-query retrieval measure based on approximate matching that is less sensitive to reco...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Journal title:
- Int. J. Cooperative Inf. Syst.
Volume 9, Issue
Pages -
Publication date: 2000